Modeling vowel duration for Japanese text-to-speech synthesis

Authors

Jennifer J. Venditti

Jan P. H. van Santen

Abstract

Accurate estimation of segmental durations is crucial for natural-sounding text-to-speech (TTS) synthesis. This paper presents a model of vowel duration used in the Bell Labs Japanese TTS system. We describe the constraints on vowel devoicing, and effects of factors such as phone identity, surrounding phone identities, accentuation, syllabic structure, and phrasal position on the duration of both long and short vowels. A Sum-of-Products approach is used to model key interactions observed in the data, and to predict values of factor combinations not found in the speech database. We report root mean squared deviations between observed and predicted durations ranging from 8 to 15 ms, and an overall correlation of 0.89.

Similar resources

Modeling segmental durations for Japanese text-to-speech synthesis

Accurate estimation of segmental durations is crucial for natural-sounding text-to-speech (TTS) synthesis. This paper presents a model of segmental duration used in the Bell Labs Japanese TTS system. We describe the constraints on vowel devoicing, and effects of factors such as phone identity, surrounding phone identities, accentuation, syllabic structure, and phrasal position on the duration of...

A study on automatic detection of Japanese vowel devoicing for speech synthesis

In corpus-based speech synthesis, the quality of the synthetic speech critically depends on the speech corpus. Since the high vowel in Japanese might be devoiced in the real speech, we should detect and transcribe them automatically in the corpus construction. In this paper, we apply the HMM-based method, and adopt two kinds of likelihood differences as voicing measures for different focuses. T...

Syllable-based acoustic modeling for Japanese spontaneous speech recognition

We study on a syllable-based acoustic modeling method for Japanese spontaneous speech recognition. Traditionally, mora-based acoustic models have been adopted for Japanese read speech recognition systems. In this paper, syllable-based unit and mora-based unit are clearly distinguished in their definition, and syllables are shown to be more suitable as an acoustic model for Japanese spontaneous ...

A Japanese text-to-speech system based on multi-form units with consideration of frequency distribution in Japanese

This paper proposes our new text-to-speech (TTS) system that concatenates large numbers of speech segments to produce very natural and intelligible synthetic speech. One novel point of our system is its new synthesis unit, which has three remarkable characteristics as follows; The synthesis units contain all Japanese syllables together with all possible vowel sequences, so very smooth synthe...

Learning Phonemic Vowel Length from Naturalistic Recordings of Japanese Infant-Directed Speech

In Japanese, vowel duration can distinguish the meaning of words. In order for infants to learn this phonemic contrast using simple distributional analyses, there should be reliable differences in the duration of short and long vowels, and the frequency distribution of vowels must make these differences salient enough in the input. In this study, we evaluate these requirements of phonemic learn...

Publication date: 1998
